Chapter 6 Measures Of Dispersion
1. Introduction
While measures of central tendency, like the arithmetic mean, provide a single representative value for a dataset, they do not tell the whole story. Averages can be misleading because they hide the variability or dispersion present in the data. Dispersion refers to the extent to which the values in a distribution differ from the average or from each other.
Consider the incomes of three families, all with an average family income of ₹15,000:
| Member | Ram's Family (₹) | Rahim's Family (₹) | Maria's Family (₹) |
|---|---|---|---|
| 1 | 12,000 | 7,000 | 0 |
| 2 | 14,000 | 10,000 | 7,000 |
| 3 | 16,000 | 14,000 | 8,000 |
| 4 | 18,000 | 17,000 | 10,000 |
| 5 | - | 20,000 | 50,000 |
| 6 | - | 22,000 | - |
| Average Income | 15,000 | 15,000 | 15,000 |
Although the average income is the same for all three families, the distribution of income within each family is vastly different. In Ram's family, the incomes are closely clustered around the average, indicating low dispersion. In Rahim's family, the incomes are more spread out. In Maria's family, the variation is extremely high: one member earns nothing while another earns ₹50,000, far above the average. This shows that knowing only the average is insufficient; we also need a measure that quantifies the spread or variability of the data.
Measures of dispersion help us understand concepts like income inequality and provide a more complete picture of a dataset. The main measures of dispersion are:
- Range
- Quartile Deviation
- Mean Deviation
- Standard Deviation
Additionally, the Lorenz Curve is a graphical method used to estimate dispersion and inequality.
2. Measures Based Upon Spread Of Values
These measures quantify dispersion by calculating the spread within which the data values lie.
Range
The Range (R) is the simplest measure of dispersion. It is the difference between the largest (L) and the smallest (S) value in a dataset.
$R = L - S$
A larger range implies greater dispersion, while a smaller range indicates less dispersion.
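As a minimal sketch, the range can be computed for the three families in the introductory table. The helper name `value_range` is purely illustrative.

```python
# Incomes (in ₹) of the three families from the introductory table.
families = {
    "Ram": [12_000, 14_000, 16_000, 18_000],
    "Rahim": [7_000, 10_000, 14_000, 17_000, 20_000, 22_000],
    "Maria": [0, 7_000, 8_000, 10_000, 50_000],
}

def value_range(values):
    """Range R = L - S: the largest value minus the smallest value."""
    return max(values) - min(values)

for name, incomes in families.items():
    mean = sum(incomes) / len(incomes)
    print(f"{name}: mean = {mean:,.0f}, range = {value_range(incomes):,}")
```

All three means are ₹15,000, but the ranges are ₹6,000, ₹15,000 and ₹50,000, matching the ordering of dispersion described in the introduction.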
Range: Comments
The main limitation of the range is that it is unduly affected by extreme values (outliers). It considers only the two most extreme values and ignores the distribution of all the observations in between. Because of this, it is not considered a very reliable measure of dispersion. However, it is simple to understand and compute, which is why it is used in everyday settings such as reporting the maximum and minimum daily temperatures.
Quartile Deviation
To overcome the problem of extreme values, we can use a measure based on the middle 50% of the data. The Interquartile Range is the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$).
$\text{Interquartile Range} = Q_3 - Q_1$
The Quartile Deviation (Q.D.), also known as the Semi-Interquartile Range, is half of the interquartile range.
$Q.D. = \frac{Q_3 - Q_1}{2}$
Since Q.D. is based on the central 50% of the data, it is not affected by extreme values, making it a more stable measure of dispersion than the range. It can also be calculated for open-ended frequency distributions.
Calculation of Range and Q.D. for Ungrouped Data
Example 1. Calculate range and Q.D. for the following observations: 20, 25, 29, 30, 35, 39, 41, 48, 51, 60, 70.
Answer:
Range (R):
$R = L - S = 70 - 20 = 50$
Quartile Deviation (Q.D.):
The data is already in ascending order. Here, n = 11.
Position of $Q_1 = (\frac{n+1}{4})^{th}$ value = $(\frac{11+1}{4})^{th}$ value = 3rd value. So, $Q_1 = 29$.
Position of $Q_3 = 3(\frac{n+1}{4})^{th}$ value = $3(\frac{12}{4})^{th}$ value = 9th value. So, $Q_3 = 51$.
$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{51 - 29}{2} = \frac{22}{2} = 11$
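The steps of Example 1 can be sketched in Python as follows. In this example the positions $(\frac{n+1}{4})$ and $3(\frac{n+1}{4})$ are whole numbers; for other values of n they may be fractional, and the sketch assumes the common convention of linear interpolation between neighbouring values in that case. The function names are hypothetical.

```python
def positional_value(sorted_values, position):
    """Value at a 1-based position; a fractional position is linearly
    interpolated between its neighbours (an assumed convention)."""
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part
    if fraction == 0:
        return sorted_values[lower - 1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)

def range_and_qd(data):
    """Return (range, Q1, Q3, Q.D.) for ungrouped data with at least 3 values."""
    values = sorted(data)
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    q1 = positional_value(values, (n + 1) / 4)
    q3 = positional_value(values, 3 * (n + 1) / 4)
    return values[-1] - values[0], q1, q3, (q3 - q1) / 2

observations = [20, 25, 29, 30, 35, 39, 41, 48, 51, 60, 70]
r, q1, q3, qd = range_and_qd(observations)
print(f"Range = {r}, Q1 = {q1}, Q3 = {q3}, Q.D. = {qd}")
```

For these observations it reproduces the hand calculation: R = 50, $Q_1$ = 29, $Q_3$ = 51 and Q.D. = 11.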
Calculation of Range and Q.D. for a Frequency Distribution
For continuous data, the range is the difference between the upper limit of the highest class and the lower limit of the lowest class.
To calculate Q.D. for a continuous frequency distribution, we first find the classes where $Q_1$ and $Q_3$ lie using the positions $(\frac{n}{4})^{th}$ and $(\frac{3n}{4})^{th}$ respectively. Then, we use the following interpolation formulas:
$Q_1 = L + \frac{(\frac{n}{4} - c.f.)}{f} \times i$
$Q_3 = L + \frac{(\frac{3n}{4} - c.f.)}{f} \times i$
Where n is the total frequency ($\sum f$), L is the lower limit of the quartile class, c.f. is the cumulative frequency of the class preceding the quartile class, f is the frequency of the quartile class, and i is the width of the class interval.
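A minimal sketch of the interpolation formulas follows. The frequency distribution is hypothetical, chosen only to exercise the formulas, and `grouped_quartile` is an illustrative name. The function walks through the classes, keeping the cumulative frequency of the preceding classes, until it reaches the class containing the $(\frac{n}{4})^{th}$ or $(\frac{3n}{4})^{th}$ item.

```python
def grouped_quartile(classes, freqs, k):
    """k-th quartile (k = 1 or 3) of a continuous frequency distribution.
    classes: (lower, upper) limits in ascending order; freqs: class frequencies."""
    n = sum(freqs)
    target = k * n / 4          # position n/4 or 3n/4
    cf = 0                      # cumulative frequency of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= target:    # this is the quartile class
            i = upper - lower   # class interval
            return lower + (target - cf) / f * i
        cf += f
    raise ValueError("quartile position exceeds total frequency")

# Hypothetical continuous distribution (illustration only).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 10, 2]

# Range: upper limit of the highest class minus lower limit of the lowest class.
data_range = classes[-1][1] - classes[0][0]
q1 = grouped_quartile(classes, freqs, 1)
q3 = grouped_quartile(classes, freqs, 3)
qd = (q3 - q1) / 2
print(f"Range = {data_range}, Q1 = {q1}, Q3 = {q3}, Q.D. = {qd}")
```

Here n = 40, so $Q_1$ lies in the 10-20 class ($10 + \frac{10 - 5}{8} \times 10 = 16.25$) and $Q_3$ in the 30-40 class ($30 + \frac{30 - 28}{10} \times 10 = 32$), giving Q.D. = 7.875.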
3. Measures Of Dispersion From Average
These measures quantify dispersion by calculating the average extent to which individual values deviate from a central value (usually the mean or median).
Mean Deviation
The Mean Deviation (M.D.) is the arithmetic mean of the absolute deviations of the observations from their central value (mean or median). By taking the absolute value of the deviations (ignoring the negative signs), we avoid the problem where the sum of deviations from the mean is always zero.
Formulas for Mean Deviation:
- For Ungrouped Data:
M.D. from Mean = $\frac{\sum |X - \bar{X}|}{n}$
M.D. from Median = $\frac{\sum |X - Median|}{n}$
- For Grouped Data (where $m$ is the mid-value of each class and $f$ its frequency):
M.D. from Mean = $\frac{\sum f|m - \bar{X}|}{\sum f}$
M.D. from Median = $\frac{\sum f|m - Median|}{\sum f}$
Mean Deviation: Comments
Mean deviation is based on all observations, so a change in any value will affect it. A key property is that the mean deviation is at its minimum when calculated from the median. However, its main drawback is that ignoring the signs of deviations is considered unmathematical, which limits its use in more advanced statistical analysis.
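A short sketch using the observations of Example 1 computes the mean deviation from both the mean and the median, illustrating the minimum property mentioned above. It relies only on Python's standard `statistics` module; `mean_deviation` is an illustrative helper.

```python
import statistics

def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations |X - centre|."""
    return sum(abs(x - centre) for x in values) / len(values)

observations = [20, 25, 29, 30, 35, 39, 41, 48, 51, 60, 70]
md_mean = mean_deviation(observations, statistics.mean(observations))
md_median = mean_deviation(observations, statistics.median(observations))
print(f"M.D. from mean   = {md_mean:.3f}")
print(f"M.D. from median = {md_median:.3f}")
```

With a mean of about 40.73 and a median of 39, the M.D. from the mean is about 12.066, while the M.D. from the median is smaller, at about 11.909.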
Standard Deviation
The Standard Deviation ($\sigma$) is the most widely used and most important measure of dispersion. It is defined as the positive square root of the arithmetic mean of the squared deviations of the observations from their mean. The square of the standard deviation is called the Variance ($\sigma^2$).
Squaring the deviations overcomes the problem of their sum being zero, and it gives more weight to larger deviations, making it a sensitive measure of variability.
Calculation Of Standard Deviation
There are several methods to calculate standard deviation, all yielding the same result.
For Ungrouped Data:
- Actual Mean Method: $\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
- Assumed Mean Method: $\sigma = \sqrt{\frac{\sum d^2}{n} - (\frac{\sum d}{n})^2}$ where $d = X - A$ and $A$ is an assumed mean
- Step-Deviation Method: $\sigma = \sqrt{\frac{\sum (d')^2}{n} - (\frac{\sum d'}{n})^2} \times c$ where $d' = \frac{X - A}{c}$ and $c$ is a common factor
For Grouped (Continuous) Data (where $m$ is the mid-value of each class and $f$ its frequency):
- Actual Mean Method: $\sigma = \sqrt{\frac{\sum f(m - \bar{X})^2}{\sum f}}$
- Assumed Mean Method: $\sigma = \sqrt{\frac{\sum fd^2}{\sum f} - (\frac{\sum fd}{\sum f})^2}$ where $d = m - A$
- Step-Deviation Method: $\sigma = \sqrt{\frac{\sum f(d')^2}{\sum f} - (\frac{\sum fd'}{\sum f})^2} \times c$ where $d' = \frac{m - A}{c}$
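To check the claim that all methods yield the same result, the sketch below applies the three ungrouped-data formulas to the observations of Example 1 and compares them with `statistics.pstdev`, which divides by n as these formulas do. The assumed mean A = 40 and common factor c = 5 are arbitrary choices for illustration.

```python
import math
import statistics

def sd_actual_mean(xs):
    """sigma = sqrt(sum((X - mean)^2) / n)"""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

def sd_assumed_mean(xs, a):
    """sigma = sqrt(sum(d^2)/n - (sum(d)/n)^2), with d = X - A"""
    n = len(xs)
    d = [x - a for x in xs]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

def sd_step_deviation(xs, a, c):
    """sigma = sqrt(sum(d'^2)/n - (sum(d')/n)^2) * c, with d' = (X - A) / c"""
    n = len(xs)
    d1 = [(x - a) / c for x in xs]
    return math.sqrt(sum(v * v for v in d1) / n - (sum(d1) / n) ** 2) * c

observations = [20, 25, 29, 30, 35, 39, 41, 48, 51, 60, 70]
A, C = 40, 5  # arbitrary assumed mean and common factor
results = [
    sd_actual_mean(observations),
    sd_assumed_mean(observations, A),
    sd_step_deviation(observations, A, C),
    statistics.pstdev(observations),
]
print([round(r, 4) for r in results])
```

All four values agree (about 14.62), apart from floating-point rounding in the last digits.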
Standard Deviation: Comments
Standard deviation is considered the best measure of dispersion because:
- It is based on all observations.
- It has desirable mathematical properties, making it suitable for further statistical analysis.
- It is well-defined and not subject to ambiguity.
- It is independent of origin (unaffected by adding or subtracting a constant from all values) but not independent of scale (it is affected by multiplying or dividing all values by a constant).
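The last property can be checked numerically with a short sketch: adding a constant (here 100, chosen arbitrarily) leaves σ unchanged, while multiplying every value by a constant (here 3) multiplies σ by the same factor.

```python
import statistics

observations = [20, 25, 29, 30, 35, 39, 41, 48, 51, 60, 70]
sigma = statistics.pstdev(observations)

shifted = [x + 100 for x in observations]  # change of origin
scaled = [x * 3 for x in observations]     # change of scale

origin_ratio = statistics.pstdev(shifted) / sigma  # about 1: origin has no effect
scale_ratio = statistics.pstdev(scaled) / sigma    # about 3: scales with the factor
print(origin_ratio, scale_ratio)
```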
4. Absolute And Relative Measures Of Dispersion
All the measures discussed so far (Range, Q.D., M.D., S.D.) are absolute measures of dispersion. They express the variation in the same units as the original data (e.g., if income is in rupees, the standard deviation will also be in rupees). This makes it difficult to compare the variability of two different datasets, especially if they have different units or significantly different means.
To overcome this, we use relative measures of dispersion. These are unit-free ratios or percentages that allow for meaningful comparison.
| Absolute Measure | Relative Measure | Formula |
|---|---|---|
| Range | Coefficient of Range | $\frac{L - S}{L + S}$ |
| Quartile Deviation (Q.D.) | Coefficient of Quartile Deviation | $\frac{Q_3 - Q_1}{Q_3 + Q_1}$ |
| Mean Deviation (M.D.) | Coefficient of Mean Deviation | $\frac{M.D._{Mean}}{\bar{X}}$ or $\frac{M.D._{Median}}{Median}$ |
| Standard Deviation ($\sigma$) | Coefficient of Variation (C.V.) | $\frac{\sigma}{\bar{X}} \times 100$ |
The Coefficient of Variation (C.V.) is the most commonly used relative measure. A lower C.V. indicates greater consistency or stability in the data, while a higher C.V. implies greater variability.
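As an illustration, the sketch below computes two of the relative measures from the table, the coefficient of range and the C.V., for the three families in the introduction. It uses the population standard deviation (dividing by n), consistent with the formulas in this chapter; the helper names are illustrative.

```python
import statistics

# Incomes (in ₹) of the three families from the introductory table.
families = {
    "Ram": [12_000, 14_000, 16_000, 18_000],
    "Rahim": [7_000, 10_000, 14_000, 17_000, 20_000, 22_000],
    "Maria": [0, 7_000, 8_000, 10_000, 50_000],
}

def coefficient_of_range(values):
    """(L - S) / (L + S)"""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

def coefficient_of_variation(values):
    """C.V. = sigma / mean * 100, with sigma the population standard deviation."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

for name, incomes in families.items():
    print(f"{name}: coefficient of range = {coefficient_of_range(incomes):.3f}, "
          f"C.V. = {coefficient_of_variation(incomes):.1f}%")
```

Because the means are equal, the C.V. ranks the families in the same order as σ: roughly 15% for Ram's family, 35% for Rahim's and over 100% for Maria's.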
5. Lorenz Curve
The Lorenz Curve is a graphical measure of dispersion used to show inequalities in distribution, particularly for income and wealth. It plots the cumulative percentage of a variable (like income) against the cumulative percentage of the population that holds it.
Construction Of The Lorenz Curve
- Data on population (or number of employees) and the corresponding variable (like income) are arranged in classes.
- Cumulative frequencies and cumulative values of the variable are calculated.
- These cumulative values are converted into percentages of their respective totals.
- The cumulative percentages of the population are plotted on the horizontal (X) axis, and the cumulative percentages of the variable are plotted on the vertical (Y) axis.
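The construction steps can be sketched in Python using Rahim's family from the introduction, treating each member as a class of one person (an assumption made to keep the example small). The members are ordered from lowest to highest income before accumulating, so the resulting points describe the Lorenz curve; `lorenz_points` is a hypothetical helper name.

```python
def lorenz_points(incomes):
    """(cumulative % of population, cumulative % of income) pairs,
    starting at (0, 0), with members ordered from poorest to richest."""
    values = sorted(incomes)
    n, total = len(values), sum(values)
    points = [(0.0, 0.0)]
    running = 0
    for k, v in enumerate(values, start=1):
        running += v
        points.append((100 * k / n, 100 * running / total))
    return points

rahim = [7_000, 10_000, 14_000, 17_000, 20_000, 22_000]
for pop_pct, income_pct in lorenz_points(rahim):
    print(f"{pop_pct:6.2f}% of members hold {income_pct:6.2f}% of income")
```

These pairs are the X and Y coordinates to plot; for example, the poorer half of the family (3 of 6 members) holds about 34.44% of the income.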
Studying The Lorenz Curve
A straight diagonal line is drawn from (0,0) to (100,100). This is called the Line of Equal Distribution. It represents a situation of perfect equality (e.g., the bottom 20% of the population earns 20% of the income).
The actual Lorenz curve lies below the line of equal distribution and coincides with it only under perfect equality. The farther the Lorenz curve is from the line of equal distribution, the greater the inequality in the distribution. This makes the Lorenz curve an effective tool for visually comparing the degree of inequality between different distributions.
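The chapter compares Lorenz curves visually. One assumed way to put a number on "how far" a curve is from the line of equal distribution is the area between them, estimated here with the trapezoidal rule on axes scaled from 0 to 1. This is an illustrative sketch, not a measure defined in this chapter, and `lorenz_gap_area` is a hypothetical name.

```python
def lorenz_gap_area(incomes):
    """Area between the line of equal distribution and the Lorenz curve
    (axes as proportions from 0 to 1), via the trapezoidal rule."""
    values = sorted(incomes)
    n, total = len(values), sum(values)
    cum_share = [0.0]
    for v in values:
        cum_share.append(cum_share[-1] + v / total)
    width = 1 / n  # population share of each member
    area_under_curve = sum((a + b) / 2 * width
                           for a, b in zip(cum_share, cum_share[1:]))
    return 0.5 - area_under_curve  # the area under the diagonal is 0.5

families = {
    "Ram": [12_000, 14_000, 16_000, 18_000],
    "Rahim": [7_000, 10_000, 14_000, 17_000, 20_000, 22_000],
    "Maria": [0, 7_000, 8_000, 10_000, 50_000],
}
gaps = {name: lorenz_gap_area(v) for name, v in families.items()}
for name, gap in gaps.items():
    print(f"{name}: area between curve and diagonal = {gap:.4f}")
```

The areas grow from Ram's family (about 0.042) to Rahim's (0.1) to Maria's (about 0.275), matching the ordering suggested by the range and the C.V.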
6. Conclusion
Measures of dispersion are essential supplements to measures of central tendency, as they provide crucial information about the variability or spread of data. Each measure has its unique characteristics:
- Range is simple but is highly affected by extreme values.
- Quartile Deviation is not affected by extreme values because it is based only on the middle 50% of the data.
- Mean Deviation and Standard Deviation are based on the deviations of all values from the average.
- Standard Deviation is the most widely used measure of dispersion and has the most useful mathematical properties for further statistical analysis.
When comparing different datasets, relative measures of dispersion like the Coefficient of Variation should be used. For a visual representation of inequality, the Lorenz Curve is an invaluable tool.
Recap
- A measure of dispersion quantifies the variation in a dataset, providing a more complete understanding than a central value alone.
- Range and Quartile Deviation are based on the spread of values, while Mean Deviation and Standard Deviation are based on deviations from an average.
- Absolute measures are expressed in the same units as the data, while relative measures are unit-free and suitable for comparisons.
- The Lorenz Curve is a graphical tool used to estimate and visualize the degree of inequality in a distribution.
Exercises
This section contains questions for practice and self-assessment, designed to test the learner's understanding of the concepts discussed in the chapter, such as choosing the appropriate measure of dispersion, calculating different measures for various datasets, and comparing the variability of different series using relative measures.